National Repository of Grey Literature: 11 records found
Preprocessing and Transformation of Text Data Collections
Maruna, Viktor ; Burget, Radek (referee) ; Bartík, Vladimír (advisor)
This bachelor thesis deals with text mining, focusing mainly on preprocessing and transformation. The theoretical part covers the development and principles of text-mining processes, text data collections, and their use in practice. The next part describes in detail the individual steps of preprocessing and transforming text data collections. The final parts review the development and testing of the application and give the author's personal view of the thesis.
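The abstract does not list the individual preprocessing and transformation steps, so the following is only a minimal sketch of steps that are common in text mining: lowercasing, tokenization, stop-word removal, and transformation into a bag-of-words term-frequency representation. The function names and the tiny stop-word list are hypothetical and are not taken from the thesis.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "to"}


def preprocess(text):
    """Lowercase the text, split it into alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def to_term_frequencies(tokens):
    """Transform a token list into a bag-of-words term-frequency mapping."""
    return Counter(tokens)


docs = ["The mining of text data.", "Text mining is the analysis of text."]
vectors = [to_term_frequencies(preprocess(doc)) for doc in docs]
```

Each entry of `vectors` maps a term to its count in one document; a real pipeline would typically add stemming or lemmatization and a weighting scheme such as TF-IDF.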
Gender recognition from the text data
Mačát, Jakub ; Burda, Karel (referee) ; Červenec, Radek (advisor)
This bachelor's thesis focuses on identifying gender from text, specifically text in the form of an e-mail, and on contemporary data-mining and text-mining techniques, including their advantages, disadvantages, and possible uses. A gender-recognition program was implemented in Java, and the processing of various learning methods is demonstrated in RapidMiner. For both programs, the thesis describes their basic attributes and the methods and operators used in the implementation. The programs were tested on real data. The thesis then mentions ways to extend the programs and finally gives examples of how the programs handle the stated assignment.
Word Sense Clustering
Jadrníček, Zbyněk ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This thesis addresses the semantic similarity of words in English. It first introduces the theory of word sense clustering and then describes selected methods and tools related to the topic. In the practical part, we design and implement a system for determining semantic similarity using the Word2Vec tool, focusing in particular on biomedical texts from the MEDLINE database. At the end of the thesis, we discuss the results achieved and suggest ways to improve the system.
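The abstract does not say how similarity is computed from the Word2Vec vectors; cosine similarity is a common choice and is assumed here. The sketch below uses hand-written three-dimensional toy vectors, which are hypothetical and are not Word2Vec output or MEDLINE data.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, invented for illustration only.
toy_vectors = {
    "aspirin": [0.9, 0.1, 0.0],
    "ibuprofen": [0.8, 0.2, 0.1],
    "kidney": [0.0, 0.3, 0.9],
}

drug_pair = cosine_similarity(toy_vectors["aspirin"], toy_vectors["ibuprofen"])
mixed_pair = cosine_similarity(toy_vectors["aspirin"], toy_vectors["kidney"])
```

With these toy values, `drug_pair` is larger than `mixed_pair`, which is the kind of ordering a similarity system would be expected to produce for related versus unrelated words.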
Application for Text Summarization
Mička, Jakub ; Zendulka, Jaroslav (referee) ; Bartík, Vladimír (advisor)
This work focuses on the implementation of a web application that serves as a tool for automatic summarization of English text. Summaries are produced by the TextRank and latent semantic analysis methods, both of which are enhanced with named entity recognition. The main contribution of this work is showing that combining named entity recognition with latent semantic analysis, and especially with TextRank, leads to higher-quality summaries. The quality of the summaries was verified using ROUGE metrics.
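The abstract does not state which ROUGE variants were used. As an illustration of the idea behind the metric, the following is a simplified sketch of ROUGE-1 recall: the share of reference unigrams that also appear in the candidate summary, with counts clipped. Whitespace tokenization and the function name are assumptions; a real evaluation would use an established ROUGE implementation.

```python
from collections import Counter


def rouge_1_recall(candidate, reference):
    """Clipped unigram overlap divided by the number of reference unigrams."""
    candidate_counts = Counter(candidate.lower().split())
    reference_counts = Counter(reference.lower().split())
    total = sum(reference_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(
        min(count, candidate_counts[word])
        for word, count in reference_counts.items()
    )
    return overlap / total


score = rouge_1_recall("the cat sat", "the cat sat on the mat")
```

Here the candidate covers three of the six reference unigrams, so `score` is 0.5.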
Tax aspects of tokenization from the Perspective of Czech and foreign legislation
Komorous, Jiří ; Sejkora, Tomáš (advisor) ; Kotáb, Petr (referee)
The aim of this thesis is to provide a comprehensive overview of the tax obligations related to the tokenization process using Distributed Ledger Technology from the perspective of Czech law, to analyze problematic areas of the applicable tax legislation, and to suggest potential changes to current tax law. The thesis also aims to provide a comparative view of the taxation of the tokenization process in selected countries and thus to evaluate the different tax obligations from the perspective of the tax subject. The first part briefly introduces cryptoassets and then describes their legal nature. It also discusses the definition of cryptoassets in relation to cryptocurrencies and the definition of the term token. The second part offers a closer analysis of the tokenization process and the tokens resulting from it. The focus is mainly on the classification of tokens and on comparing how different jurisdictions approach that classification. Depending on the purpose of tokenization, tokens of different legal nature with different tax obligations are issued; therefore it is crucial to define the types of issued tokens and determine their...
Fast and Trainable Tokenizer for Natural Languages
Maršík, Jiří ; Bojar, Ondřej (advisor) ; Spousta, Miroslav (referee)
In this thesis, we present a data-driven system for disambiguating token and sentence boundaries. The implemented system is highly configurable and versatile, to the point that it can segment unbroken Chinese text. The tokenizer relies on maximum entropy classifiers and requires a sample of tokenized and segmented text as training data. The program is accompanied by a tool for reporting tokenization performance, which helps to rapidly develop and tune the tokenization process. The system was built using only multi-platform libraries, with an emphasis on speed and correctness. After a necessary survey of other tools for text tokenization and segmentation and a short introduction to maximum entropy modelling, a large part of the thesis focuses on the particular implementation we developed and its evaluation.
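The abstract does not describe how the accompanying tool measures tokenization performance. One common approach, assumed here, is to compare predicted token boundaries against a gold-standard tokenization of the same text and report precision and recall. The function names below are hypothetical and do not come from the thesis.

```python
def boundary_offsets(tokens):
    """Character offsets (whitespace ignored) at which each token ends."""
    offsets = set()
    position = 0
    for token in tokens:
        position += len(token)
        offsets.add(position)
    return offsets


def boundary_scores(predicted, gold):
    """Precision and recall of predicted token boundaries against gold ones."""
    predicted_offsets = boundary_offsets(predicted)
    gold_offsets = boundary_offsets(gold)
    correct = len(predicted_offsets & gold_offsets)
    precision = correct / len(predicted_offsets) if predicted_offsets else 0.0
    recall = correct / len(gold_offsets) if gold_offsets else 0.0
    return precision, recall


gold = ["Hello", ",", "world", "."]
predicted = ["Hello,", "world", "."]
precision, recall = boundary_scores(predicted, gold)
```

In this example every predicted boundary is correct (precision 1.0), but the boundary between "Hello" and the comma is missed, so recall is 0.75.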
